cient Learning of Semi - structured Data from Queries

نویسندگان

  • Hiroki Arimura
  • Hiroshi Sakamoto
چکیده

This paper studies the polynomial-time learnability of the classes of ordered gapped tree patterns (OGT) and ordered gapped forests (OGF) under the into-matching semantics in the query learning model of Angluin. The class OGT is a model of semi-structured database query languages, and a generalization of both the class of ordered/unordered tree pattern languages and the class of non-erasing regular pattern languages. First, we present a polynomial time learning algorithm for OGT, the subclass of OGT without repeated tree variables, using equivalence queries and membership queries. By extending this algorithm, we present polynomial time learning algorithms for the classes -OGF of forests without repeated variables and OGT of trees with repeated variables using equivalence queries and subset queries. We also give representation-independent hardness results which indicate that both of equivalence and membership queries are necessary to learn -OGT.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient Learning of Semi-structured Data from Queries

This paper studies the learning complexity of classes of structured patterns for HTML/ XML-trees in the query learning framework of Angluin. We present polynomial time learning algorithms for ordered gapped tree patterns, OGT, and ordered gapped forests, OGF, under the into-matching semantics using equivalence queries and subset queries. As a corollary, the learnability with equivalence and mem...

متن کامل

Enhancing passage retrieval in log files by query expansion based on explicit and pseudo relevance feedback

Passage retrieval is usually defined as the task of searching for passages which may contain the answer for a given query. While these approaches are very e cient when dealing with texts, applied to log files (i.e. semi-structured data containing both numerical and symbolic information) they usually provide irrelevant or useless results. Nevertheless one appealing way for improving the results ...

متن کامل

Normalization and Learning of Transducers on Trees and Words. (Normalisation et Apprentissage de Transducteurs d'Arbres et de Mots)

Since the arrival of the Web, various kinds of semi-structured data formats were introduced in the areas of computer science and technology relevant for the Web, such as document processing, database management, knowledge representation, and information exchange. The most recent technologies for processing semi-structured data rely on the formats Json and Rdf. The main questions there are how t...

متن کامل

Georeferencing Semi-Structured Place-Based Web Resources Using Machine Learning

In recent years, the shared content on the web has had significant growth. A great part of these information are publicly available in the form of semi-strunctured data. Moreover, a significant amount of these information are related to place. Such types of information refer to a location on the earth, however, they do not contain any explicit coordinates. In this research, we tried to georefer...

متن کامل

Evaluation of Top-k Queries over Structured and Semi-structured Data

Evaluation of Top-k Queries over Structured and Semi-structured Data

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007